Rank | Count | Beginning |
---|---|---|
19717 | 4495 | The |
9620 | 944 | In |
27526 | 856 | We |
25438 | 852 | This |
10948 | 839 | It |
7308 | 760 | He |
8651 | 711 | I |
139 | 452 | A |
13999 | 373 | Mr |
2053 | 356 | As |
25030 | 337 | They |
18115 | 325 | She |
6615 | 284 | Full |
8967 | 262 | If |
6145 | 243 | For |
23294 | 240 | There |
27532 | 235 | “We |
26556 | 208 | To |
3372 | 206 | But |
17800 | 205 | Seychelles |
19823 | 205 | “The |
23858 | 191 | These |
29604 | 189 | You |
15776 | 181 | Our |
16633 | 173 | President |
15212 | 166 | On |
5069 | 163 | During |
542 | 159 | After |
849 | 154 | All |
28876 | 151 | When |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
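The same count can be sketched outside the database, which also makes it easy to vary N for the following subsections. The Python below is a minimal illustration, not the code used to produce the table: the function name `sentence_beginnings` and the in-memory list of sentences are assumptions made for this example. It mirrors `substring_index(sentence, ' ', N)` by splitting on single spaces, so a sentence with fewer than N words is counted as a whole.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, top=50):
    """Return the `top` most frequent sentence beginnings of n words.

    Hypothetical helper: mirrors substring_index(sentence, ' ', n)
    followed by GROUP BY ... ORDER BY cnt DESC LIMIT top.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on single spaces, as substring_index does with ' '.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count.
    return counts.most_common(top)


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call.
    sample = [
        "The cat sat.",
        "The dog ran.",
        "In the end it rained.",
        "The cat slept.",
    ]
    for n in (1, 2):
        print(n, sentence_beginnings(sample, n=n, top=3))
```

Calling the function with `n=2`, `n=3`, or `n=4` gives the corresponding longer beginnings; in the SQL query the equivalent change is the third argument of `substring_index`.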